Characterization of Randomized Shuffle and Sort Quantifiability in MapReduce Model

Authors

  • Saikat Mukherjee
  • Ravi Prakash
  • Rajagopal Ananthanarayanan
  • Karan Gupta
  • Prashant Pandey
  • Himabindu Pucha
  • Prasenjit Sarkar
  • Mansi Shah
  • Mark Dredze
  • Alex Kulesza
  • Fay Chang
  • Jeffrey Dean
  • Sanjay Ghemawat
  • Wilson C. Hsieh
  • Deborah A. Wallach
  • Michael Burrows
  • Tushar Chandra
  • Andrew Fikes
  • Arthur Asuncion
  • Padhraic Smyth
  • Brian F. Cooper
  • Adam Silberstein
  • Erwin Tam
  • Raghu Ramakrishnan
  • Olivier Bousquet
  • Giuseppe DeCandia
  • Deniz Hastorun
  • Madan Jampani
  • Gunavardhan Kakulapati
  • Avinash Lakshman
  • Alex Pilchin
  • Swami Sivasubramanian
  • Peter Vosshall
Abstract

Quantifiability is a concept in MapReduce analytics based on the following two conditions: (a) a mapper should be cautious, that is, it should not exclude any reducer's shuffle and sort strategy from consideration; and (b) a mapper should respect the reducers' shuffle and sort preferences, that is, it should deem a reducer's shuffle and sort strategy k_i infinitely more likely than k'_i if it presumes the reducer to prefer k_i to k'_i. A shuffle and sort strategy is quantifiable if it can optimally be chosen under common belief in conditions (a) and (b). In this paper we present an algorithm that, for every finite MapReduce operation, computes the set of all quantifiable shuffle and sort strategies. The algorithm is based on the new idea of a key-value preference limitation, which is a pair (k_i, V_i) consisting of a shuffle and sort strategy k_i and a subset V_i of shuffle and sort strategies, for mapper i. The interpretation is that mapper i prefers some shuffle and sort strategy in V_i to k_i. The algorithm proceeds by successively adding key-value preference limitations to the MapReduce operation.
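The abstract does not spell the algorithm out, but the idea of successively adding key-value preference limitations (k_i, V_i) can be sketched as an iterated-elimination procedure. Everything below is an illustrative assumption: the strategy sets, the `payoff` function, and the elimination rule (recording a limitation once some set V_i of alternatives is strictly preferred to k_i against every surviving profile) are hypothetical stand-ins, not the authors' exact method.

```python
from itertools import product

def quantifiable_strategies(strategies, payoff):
    """Sketch: repeatedly record key-value preference limitations
    (mapper, k_i, V_i) and drop each limited strategy k_i, until no
    mapper strictly prefers some surviving alternative in every case.

    strategies: dict mapper -> set of strategy labels
    payoff: function (mapper, profile dict) -> number
    """
    remaining = {m: set(s) for m, s in strategies.items()}
    limitations = []  # recorded (mapper, k_i, V_i) triples
    changed = True
    while changed:
        changed = False
        for m in remaining:
            others = [o for o in remaining if o != m]
            # all profiles the other mappers can still play
            opp_profiles = list(product(*(sorted(remaining[o]) for o in others)))
            for k in sorted(remaining[m]):
                # V_i: alternatives strictly better than k in every profile
                better = {alt for alt in remaining[m] if alt != k and all(
                    payoff(m, {**dict(zip(others, prof)), m: alt}) >
                    payoff(m, {**dict(zip(others, prof)), m: k})
                    for prof in opp_profiles)}
                if better:
                    limitations.append((m, k, frozenset(better)))
                    remaining[m].discard(k)
                    changed = True
    return remaining, limitations

# Hypothetical toy instance: two mappers with independent preferences.
strategies = {'M1': {'a', 'b'}, 'M2': {'x', 'y'}}

def payoff(mapper, profile):
    # Assumed payoffs: M1 prefers 'a', M2 prefers 'x', in every profile.
    if mapper == 'M1':
        return 1 if profile['M1'] == 'a' else 0
    return 1 if profile['M2'] == 'x' else 0

surviving, limitations = quantifiable_strategies(strategies, payoff)
# surviving == {'M1': {'a'}, 'M2': {'x'}}
```

In this toy run the procedure adds the limitations (M1, b, {a}) and (M2, y, {x}) and stops, leaving only the quantifiable strategies.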


Similar Resources

Traffic Analysis in MapReduce

MapReduce is a programming model that can process large data sets and produce output. MapReduce comprises two functions that complete the work: the Map function and the Reduce function. The Map function is assigned fragmented data as input, emits intermediate data with a key, and sends this keyed intermediate data to the Reducer, where the Reducer gets the inpu...
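The Map, shuffle/sort, and Reduce flow described above can be simulated in a few lines; the standard word-count illustration below is a self-contained sketch, with function names and an in-memory shuffle that are illustrative rather than part of any particular framework:

```python
from collections import defaultdict

def map_fn(fragment):
    # Map: each mapper gets a data fragment and emits intermediate
    # (key, value) pairs.
    return [(word, 1) for word in fragment.split()]

def shuffle_sort(pairs):
    # Shuffle/sort: group the intermediate values by key so each
    # reducer sees all values for its keys, in key order.
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # Reduce: aggregate all values observed for one key.
    return key, sum(values)

fragments = ["map reduce map", "reduce shuffle"]
intermediate = [pair for frag in fragments for pair in map_fn(frag)]
result = dict(reduce_fn(k, v) for k, v in shuffle_sort(intermediate).items())
# result == {'map': 2, 'reduce': 2, 'shuffle': 1}
```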


Asymmetric Key-Value Split Pattern Assumption over MapReduce Behavioral Model

Actual quantifiability is a concept in MapReduce that is based on two assumptions: (1) every mapper is cautious, i.e., does not exclude any reducer's key-value split pattern choice from consideration, and (2) every mapper respects the reducers' key-value split pattern preferences, i.e., deems one reducer's key-value split pattern choice to be infinitely more likely than anoth...


Optimization and analysis of large scale data sorting algorithm based on Hadoop

When dealing with massive data sorting, we usually use Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach to implementing big data sorting is to use the shuffle and sort phase of MapReduce on Hadoop. However, if we use it directly, the efficiency can be very low and the loa...


MapReduce with communication overlap (MaRCO)

MapReduce is a programming model from Google for cluster-based computing in domains such as search engines, machine learning, and data mining. MapReduce provides automatic data management and fault tolerance to improve programmability of clusters. MapReduce’s execution model includes an all-map-to-all-reduce communication, called the shuffle, across the network bisection. Some MapReductions mov...


Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

MapReduce and Spark are two very popular open source cluster computing frameworks for large scale data analytics. These frameworks hide the complexity of task parallelism and fault-tolerance, by exposing a simple programming API to users. In this paper, we evaluate the major architectural components in MapReduce and Spark frameworks including: shuffle, execution model, and caching, by using a s...



Journal:

Volume   Issue

Pages  -

Publication date: 2013